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Abstract. The synchronization problem over the special orthogonal group SO(d) consists of 
estimating a set of unknown rotations Ri, R2, ■ ■ ■ , R n from noisy measurements of a subset of their 
pairwise ratios R~ Rj. The problem has found applications in computer vision, computer graphics, 
and sensor network localization, among others. Its least squares solution can be approximated by 
cither spectral relaxation or semidefinite programming followed by a rounding procedure, analogous 
to the approximation algorithms of Max-Cut. The contribution of this paper is three-fold: First, we 
introduce a robust penalty function involving the sum of unsquared deviations and derive a relaxation 
that leads to a convex optimization problem; Second, we apply the alternating direction method to 
minimize the penalty function; Finally, under a specific model of the measurement noise and the 
measurement graph, we prove that the rotations are exactly and stably recovered, exhibiting a phase 
transition behavior in terms of the proportion of noisy measurements. Numerical simulations confirm 
the phase transition behavior for our method as well as its improved accuracy compared to existing 
methods. 
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1. Introduction. The synchronization problem over the special orthogonal group 
SO(d) of rotations in R d 

SO{d) = {Re R dxd : R T R = RR T = I d , dct R=l} (1.1) 

consists of estimating a set of n rotations € SO(d) from a subset of 

(perhaps noisy) measurements Rij of their ratios R~ 1 Rj. The subset of available 
ratio measurements is viewed as the edge set of an undirected graph Q — (V, £ ), with 
|V| = n. The goal is to find R\, . . . , R n that satisfy 

Rj '/.'., * //.,- far(*,j)e£. (1.2) 

Synchronization over the rotation group SO(d) has many applications. Synchroniza- 
tion over SO (2) plays a major role in the framework of angular embedding for ranking 
and for image reconstruction from pairwise intensity differences [381 139] and for a cer- 
tain algorithm for sensor network localization [7 . Synchronization over SO (3) is in- 
voked by many algorithms for structure from motion in computer vision 19.135, 12|l2j. 
by algorithms for global alignment of 3-D scans in computer graphics [35] , and by al- 
gorithms for finding 3-D structures of molecules using NMR spectroscopy [5] and 
cryo-electron microscopy [Ml [3U] • A closely related problem in terms of applications 
and methods is the synchronization over the orthogonal group 0(d), where the require- 



ment of positive determinant in (1.1) is alleviated. We remark that the algorithms 
and analysis presented in this paper follow seamlessly to the case of 0(d). We choose 
to focus on SO(d) only because this group is encountered more often in applications. 

If the measurements Rij are noiseless and the graph Q is connected then the 
synchronization problem can be easily solved by considering a spanning tree in Q, 
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setting the rotation of the root node arbitrarily, and determining all other rotations by 
traversing the tree while sequentially multiplying the rotation ratios. The rotations 
obtained in this manner are uniquely determined up to a global rotation, which is 
the intrinsic degree of freedom of the synchronization problem. However, when the 
ratio measurements are corrupted by noise, the spanning tree method suffers from 
accumulation of errors. Estimation methods that use all available ratio measurements 
and exploit the redundancy of graph cycles are expected to perform better. 

Methods that are based on least squares have been proposed and analyzed in 
the literature. While the resulting problem is non-convex, the solution to the least 
squares problem is approximated by either a spectral relaxation (i.e., using leading 
eigenvectors) or by semidefmite programming (see, e.g., [57J HH1 I3H1 E])- Typically 
in applications, the ratio measurements generally consist of noisy inlicrs, which are 
explained well by the rotations R\ 1 . . . , R n , along with outliers, that have no struc- 
ture. The least squares method is sensitive to these outliers. In this paper we propose 
to estimate the rotations by minimizing a different, more robust self consistency er- 
ror, which is the sum of unsquared residuals [231 1321 HZ], rather than the sum of 
squared residuals. The minimization problem is semidefmite relaxed and solved by 
the alternating direction method. Moreover, we prove that under some conditions the 
rotations Ri, . . . , R n can be exactly and stably recovered (up to a global rotation), see 
Theorems |4 . 1 1 and |5 . 1 1 Our numerical experiments demonstrate that the new method 
significantly improves the estimation of rotations, and in particular, achieving state- 
of-the-art results. 

The paper is organized as follows: In Section [2] we review existing SDP and 
spectral relaxation methods for approximating the least squares solution. In Section 
[3] we derive the more robust least unsquared deviation (LUD) cost function and its 
convex relaxation. In Section[4]we introduce the noise model and prove conditions for 
exact recovery by the LUD method. In Section [5] we prove that the recovery of the 
rotations is stable to noise. The results of numerical experiments on both synthetic 
data as well as for global alignment of 3D scans are reported in Section [7J Finally, 
Section [8] is a summary. 

2. Approximating the Least Squares Solution. In this section we overview 
existing relaxation methods that attempt to approximate the least squares solution. 
The least squares solution to synchronization is the set of rotations R\ , . . . , R n in 
SO(d) that minimize the sum of squared deviations 

min V WiiW^Rj-Riif, (2.1) 

Ri,...,R n £SO(d) JU 

(i,i)ee 

where || • || denotes the Frobenius norrrj^j toy are non- negative weights that reflect 
the measurement precision^] and Rij are the noisy measurements. The feasible set 



SO(d) n = SO(d) x ••• x SO(d) of the minimization problem (2.1 1 is however non- 
convex. Convex relaxations of ( |2.1[ ) involving semidefmite programming (SDP) and 
spectral methods have been previously proposed and analyzed. 

2.1. Semidefmite Programming Relaxation. Convex relaxation using SDP 
was introduced in [37] for 5(9(2) and in [2] for SO(3) and is easily generalized for any 



lr The Frobenius norm of an m X n matrix A is defined as \\A\\ = ^/y"]^^ YVj—i ^jy 
2 If all measurements have the same precision, then the weights take the simple form Wij = 1 for 
(i, 3) £ S and uiij = otherwise. 
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d. The method draws similarities with the Goemans and Williamson approximation 
algorithm to Max- Cut that uses SDP [IT] . 

The first step of the relaxation involves the observation that the least squares 



problem (2.1| is equivalent to the maximization problem 



max V WijTrtR^RiRfi), 

..,R n £SO(d) ^ %V 



(2.2) 



due to the fact that \\R\ 1 R j \\ 2 = \\Rij\\ 2 = d. 

The second step of the relaxation introduces the matrix G of size n x n whose 
entries 



Gij — Rj Rj 



(2.3) 



are themselves matrices of size d x d, so that the overall size of G is nd x nd. The 
matrix G admits the decomposition 

G = R? R, 



where R is a matrix of size d x nd given by 

R = [ Ri R 2 



Rn 



(2.4) 



(2.5) 



The objective function in (2.2) can be written as Tr (GC), where the entries of C are 
given by CV,- = WijR'L (notice that C is symmetric, since Rfj = Rji and my = Wji). 
The matrix G has the following properties: 

1. G >z 0, i.e., it is positive semidefinite (PSD). 

2. Gu = Id for i = 1, . . . ,n. 

3. rank(G) = d. 

4. det(Gy) = 1 for i, j = 1, . . . , n. 

Relaxing properties ^ and Q leads to the following SDP: 



maxTr(GC) 



(2.6) 



s.t. G h 



for i 



1,2,. 



Notice that for tf = 1 (2.6 1 reduces to the SDP for Max-Cut [TT]. Indeed, synchro- 
nization over O(l) = Z 2 is equivalent to Max-Cut. [^] 

The third and last step of the relaxation involves the rounding procedure. The 
rotations R\, . . . , R n need to be obtained from the solution G to (2.6 1. The rounding 



procedure can be either random or deterministic. In the random procedure, a matrix 
Q of size nd x d is sampled from the uniform distribution over matrices with d or- 
thonormal columns in R nd (see 20 for description of the sampling procedure) . Then, 
the product GQ of size nd x d is computed, and viewed as n matrices of size d x d, 
denoted by (GQ)i, . . . , (GQ) n . The i'th rotation i?j is then obtained via the singular 
value decomposition (SVD) (equivalently, via the polar decomposition) of [GQ)i as 
(see, e.g., dHHJUS]) 



{GQ), = UiZiV? 



J = 



h-i 
det U t VT 



Ri = UiJVS 



(2.7) 



3 The case of SO(2) is special in the sense that it is possible to represent group elements as 
complex-valued numbers, rendering G a complex-valued Hermitian positive semidefinite matrix of 
size n X n (instead of a 2n X 2n real valued positive semidefinite matrix). 
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(the only difference for synchronization over O(d) is that Ri = UiV^ ', excluding any 
usage of J). In the deterministic procedure, the top d eigenvectors u 1; . . . , u d G W ld of 
G corresponding to its d largest eigenvalues are computed. A matrix T of size nd x d 
whose columns are the eigenvectors is then formed, i.e., T = [ v\ ■■■ Vd ] . As 
before, the matrix T is viewed as n matrices of size d x d, denoted T±, . . . ,T n and the 



SVD procedure detailed in (2.7) is applied to each 1} (instead of (GQ)i) to obtain Ri 



2.2. Spectral Relaxations. Spectral relaxations for approximating the least 
squares solution have been previously considered in pZTJ HI EM [Ml HS1 13 EH E23 • All 
methods are based on eigenvectors of the graph connection Laplacian (a notion that 
was introduced in 29J) or one of its normalizations. The graph connection Laplacian, 
denoted L\ is constructed as follows: define the symmetric matrix W\ £ K" dxnd j such 
that the (i, j)'th dxd block is given by (Wi)ij = w l3 R ir Also, let D x € R ndxnd 7 be 
a diagonal matrix such that (Di)u = dild, where di — Yl j Wij. The graph connection 
Laplacian L\ is defined as 

L 1 =D 1 -W 1 . (2.8) 

It can be verified that L\ is PSD, and that in the noiseless case L\R T = 0. The least 
d eigenvectors of L% corresponding to its smallest eigenvalues, followed by the SVD 



procedure for rounding (2.7) can be used to recover the rotations. A slightly modified 



procedure that uses the eigenvectors of the normalized graph connection Laplacian 
A = D- 1 ' 2 ^- 112 = I nd - D- l/2 WxDl 1/2 (2.9) 



is analyzed in [S]. Specifically, Theorem 10 in [5J bounds the least squares cost ( |2.1[ ) 
incurred by this approximate solution from above and below in terms of the eigen- 
values of the normalized graph connection Laplacian and the second eigenvalue of 
the normalized Laplacian (the latter reflecting the fact that synchronization is easier 
on well-connected graphs, or equivalently, more difficult on graphs with bottlenecks). 
This generalizes a previous result obtained by [34] that considers a spectral relaxation 
algorithm for Max-Cut that achieves a non-trivial approximation ratio. 

3. Least Unsquared Deviation (LUD) and Semidefinite Relaxation. As 

mentioned earlier, the least squares approach may not be optimal when a large propor- 
tion of the measurements are outliers [331 E21 HZ] • To guard the orientation estimation 



from outliers, we replace the sum of squared residuals in (2.1) with the more robust 
sum of unsquared residuals 

min V WR^Ri-RuW, (3.1) 

K 1 ,...,fl„£SO(d) V 



to which we refer as LUD The self consistency error given in (3.1) mitigates the 



contribution from large residuals that may result from outliers. However, the prob- 



lem (3.1) is non-convex and therefore extremely difficult to solve if one requires the 



matrices Ri to be rotations, that is, when adding the orthogonality and determinant 



constraints of SO{d) given in (1.1) 



Notice that the cost function (3.1) can be rewritten using the Gram matrix G 



that was defined earlier in (2.3) for the SDP relaxation. Indeed, the optimization 



4 For simplicity we consider the case where Wij = 1 for £ S. In general, one may consider 

the minimization of Ylu j)e£ W V ^"3 ~ ' 



(3.1) is equivalent to 



mm 

■ ,R n eSO(d) 



E WGi 



Ri 



(3.2) 



Relaxing the non-convex rank and determinant constraints as in the SDP relaxation 



leads to the following natural convex relaxation of the optimization problem (3.1 1 : 



mm 

G 



E \\G. 



RiA\ s.t. Gi 



I d , and G >= 0. 



(3.3) 



(i,j')ef 



This type of relaxation is often referred to as semidefmite relaxation (SDR) [3"T] US]. 
Once G is found, cither the deterministic or random procedures for rounding can be 
used to determine the rotations R%, . . . , R n . 

4. Exact Recovery of the Gram matrix G. Consider the Gram matrix G 
as G — (Gij) i =1 n where = RfRj is obtained by solving the minimization 



problem (3.3). We will show that for a certain probabilistic measurement model, the 
Gram matrix G is exactly recovered with high probability (w.h.p.). Specifically, in 
our model, the measurements Rij are given by 



Rij - S, , H, Rj 



(l-Sij)Ri. 



(4.1) 



for (i,j) £ £, where 5^ are i.i.d indicator Bernoulli random variables with prob- 



ability p (i.e. 



1 with probability p and Sij = with probability 1 — p). 



The incorrect measurements Rij are i.i.d samples from the uniform (Haar) distri- 
bution over the group SO(d). Assume all pairwise ratios are measured, that is, 
£ = \h 3 ' = 1 • • • Let £ c denote the index set of correctly measured rota- 

tion ratios Rij, that is, £ c = \Rij = RfRj}. In fact, £ c is the edge set of a 

realization of a random graph drawn from the Erdos-Renyi model Q(n,p). In the re- 
mainder of this section we shall prove the existence of of a critical probability, denoted 



p* (d) , such that for all p > p*(d) the program ( 3.3 ) recovers G from the measurements 
(4.1 ) w.h.p. (that tends to 1 as n — > oo). In addition, we give an explicit upper bound 
p c (d) for p* c (d). 

Theorem 4.1. Assume that all pairwise ratio measurements Rij are generated 
according to \4- ty - Then there exists a critical probability p* c (d) such that when p > 
Pc(d-), the Gram matrix G is exactly recovered by the solution to the optimization 
problem (3.3) w.h.p. (as n — > oo). Moreover, an upper bound p c (d) for p* c (d) is 



Pc(d) = 1 - 



' '-Cl (d) + Jc~ x (d) + 4 (c (d) + 1/y/d) /y/d 



\ 



V 



(c (d) + l/Vd 



(4.2) 



where c{d) and c\(d) are constants defined as 



c(d) = -E 
d 



Tr 



id: 



R 



(4.3) 



b The measurements in 1 4.1 1 satisfy Rij 



c^^f^f 2 "- (4.4) 

In particular, 

Vc (2) « 0.6237, p c (3) w 0.6966, and p c (4) « 0.7454 

/or SO (2), SO (3) and SO (4) respectively. 

Remark. The case d = 1 is trivial since the special orthogonal group contains 
only one element in this case, but is not trivial for the orthogonal group O(l). In fact, 
the latter is equivalent to Max-Cut. Our proof below implies that p c {l) = 0, that is, 
if the proportion of correct measurements is strictly greater than 1/2, then all signs 
are recovered exactly w.h.p as n — > oo. In fact, the proof below shows that w.h.p the 
signs are recovered exactly if the bias of the proportion of good measurements is as 

small as O 



Proof. Since the optimization problem (3.3) is convex, it suffices to show that 
w.h.p. (over the measurements), the correct solution is a local minimum of the ob- 
jective function. 

We shall analyze the local perturbation of the objective function directly. The 
proof proceeds in two steps. First, we show that without loss of generality we can 
assume that the correct solution is Ri — Id and Gij = Id for all 1 < i,j ' < n. Then, 
we consider the projection of the perturbation into three different subspacesj^] Using 
the fact that the diagonal blocks of the perturbation matrix must be 0, it is possible 
to show that when the perturbation reduces the objective function for indices in £\£ c 
(that is, the incorrect measurements), it has to increase the objective function for 
indices in £ c . If the success probability p is large enough, then w.h.p. the amount of 
increase (to which we later refer as the "loss" ) is greater than the amount of decrease 
(to which we later refer as the "gain"), therefore the correct solution must be the 



solution of the convex optimization problem (3.3). 



4.1. Fixing the correct solution. Lemma 4.2. If the optimal solution of 



(3.3) is w.h.p. Gij — Id when Ri = Id, then the optimal solution of \3.3) is w.h.p. 
Gij = RfRj for arbitrary rotations {Ri}. 

Proof. We give a bijection that preserves feasibility and objective value between 



the feasible solutions to (3.3) when Ri = Id and the solution for general Ri 



In fact, given any feasible solution G for general Ri, let 

G = diag (i?i, . . . , Rn) Gdiag (R 1: . . . , R n ) T . 

that is, G^ — RiG^Rj . If we rotate R^ similarly to get R\j — RiR^Rj , then 
Ri j — I. Since G is the solution of (3.3), we know G must be a solution to the 



following convex program with the same objective value 

s.t. G u = I d , and G fc= 0. (4.5) 



mm | Gjj — Rij 



However, observe that for edges in £ c , RiRijRj — Id; for edges not in £ c , RiR^Rj 
is still a uniformly distributed random rotation in SO(d) (due to left and right invari- 



ance of the Haar measure). Therefore, (4.5) is equivalent to (3.3) when Ri = Id- 



6 The method of decomposing the perturbation was used in |17] for analyzing the performance of 
a convex relaxation method for robust principal component analysis. 
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The other direction can be proved identically. □ 

Using the Lemma |4.2| we can assume without loss of generality that Ri = Id for 
all 1 < i < n. Now the correct solution should be G%j — Id- We denote this solution 
by G, and consider a perturbation G + A. 

4.2. Decomposing the perturbation. Let G + A be a perturbation of G such 
that An = and G + A ^ 0. Let S be the d-dimensional linear subspace of M. dn 
spanned by the vectors si, . . . , Sd, given by 

si = (ei,ei, . . . ,ei) T , 
s 2 = (e 2 ,e 2 , . . . ,e 2 ) T , 



Sd = (e d ,e d , . . . ,e d ) T , (4.6) 
where e m (m = 1, 2, . . . , d) is the <i-dimensional row vector 

6m (v 



1 if Z = m, 
otherwise, 



that is, ei = (1, 0, . . . , 0), e 2 = (0, 1, 0, . . . , 0), . . . and e d = (0, . . . , 0, 1). 

Intuitively, if a vector v is in the space S, then the restrictions of v on blocks 
of size d are all the same (i.e. d = v d+i....,2d = ••• = v ( n —\)d+\,...,nd) • On the 

other hand, if a vector v is in S, the orthogonal compliment of S, then it satisfies 

J2i=l v (i-l)d+l,...,id = o. 

For two linear subspaces A, B, we use A ® B to denote the space spanned by 
vectors {a ® b|a 6 ^4 and b e B}. By properties of tensor operation, the dimension 
of A <£) B equals the dimension of A times the dimension of B. We also identify the 
tensor product of two vectors a® b with the (rank 1) matrix ab T . 

Now let P, Q and T be A's projections to S ® <S, (S ® 5) U (5 ® S) , and 5 ® S 
respectively, that is, 

A = P + Q + T, where P G 5 ® 5, Q G (5 ® <S) U («S ® S) , and T G S ® 5. 

Using the definition of 5, it is easy to verify that 

Pij=P n Vl<z,j<n, (4.7) 

Q = Q 1 +Q 2 , (4.8) 
where Q 1 G 5 <g> 5, Q 2 G S ® S, Q 1 = (Q 2 ) T , 

Q\j=Qlj = ■■■ = Q 1 n j^3 (4-9) 



and 



Moreover, we have 

E Q« = o, E^i = ° and E ^ = °- ( 4 - n ) 



For the matrix T, the following notation is used to refer to its submatrices 



<■> ^D,, , r -i /- (v;/) w , • 

where are T's d x d sub-blocks and T pq are T's n x n sub-matrices whose entries 
are the (p, g) 'th elements of the sub-blocks Tij . 

Intuitively, if the objective value of G + A is smaller than the objective value of 
G, then A^- should be close to for the correct measurements € £ c , and is large 
on such that £ £ c . 

We shall later show that the "gain" from £ c can be upperbounded by the 

trace of T and the off-diagonal entries of Q\j . Then we shall show that when the trace 
of T is large, the diagonal entries of A^ for G £ c are large, therefore the "gain" 
is smaller than the "loss" generated by these diagonal entries. On the other hand, 
when the off-diagonal entries of Qjj are large, then the off-diagonal entries of A^- for 
G £ c are large, once again the "gain" will be smaller than the loss generated by 
the off-diagonal entries. 

4.3. Observations on P, Q and T. To bound the "gain" and "loss", we need 
the following set of observations. 
Lemma 4.3. nP n = -J^=i T u- 

Proof. Using (|4.7[)-( |4.1l] | and the fact that A u = 0, we have 

o = a« = ]T ( p " + Qlt + Q« + T «) 

i i 

= np n + Qu +E^i+E T » = nP " + E T »- 

i i i i 

□ 

Lemma 4.4. (P y + Q y ) + (P yi + Q 3l ) = - (T u + Tjj) . 
Proof. We use the symmetry of the matrices P, Q: 

(Pij + Qij) + (Pji + Qji) 
=Pu + Qlj + Qij + Pjj + Q], + Q% 

Pii + Qjj Qii Pjj Qii ~t~ Qjj 

= (/>, + Qii + Ql) + (P n + Q), + Q%) 
= (Aii — Tu) + (Ajj — Tjj) 
= ~ (Tu + Tjj) , 



where the second equality follows (4.9 )-(4.10 ). □ 
Lemma 4.5. T ^ 0. 

Proof. Since G + A = G + P + Q + T ^ 0, for any vector v G S, v T (G + A)v > 0. 
However, v T Gv = v T Pv = v T Qv = according to the definition of G, P and Q. 
Therefore, for all v G 5 we have v T Tv > 0. Also, Tw = for w G 5. Therefore, 
if v G S and w G 5 then (v + w) T T(v + w) = v T Tv > 0. Hence, T is positive 
semidefinite. □ 

Lemma 4.6. Le£ be a diagonal matrix whose diagonal entries are those ofTij, 

then WZiTg > Tr(T) / \fd. 

Proof. This is just a straight forward application of the CauchySchwarz inequality. 
Denote X = J^i Tu- Clearly, X fc= , since it is a sum of positive semidefinite matrices. 



Let xi,. . . ,Xd be the diagonal entries of X. Then, from CauchySchwarz inequality, 
we have 



Tr(X) = 5>< 



/ d = Vd\\X\ 



that is, ||E<2«|| = 11*11 > Tr (X) / \fd = Tr (T) / \[d. □ 

Lemma 4.7. Let A be a n x n adjacency matrix such that A^ = 1 if and only 
if (hj) G £c> otherwise, Aij — 0. Let B = A — ^ (l T ^l) H T ; 1; where 1 is i/ie 
all-ones (column) vector. Let A = ||B||2, where \\ ■ || 2 denotes the spectral norm of a 
matrix. Then, 

X = P (Vn) • 

Here the notation f = Op(g) means f is upper bounded by c p ■ g with high proba- 
bility, where c p is a constant that may depend on p. 

Proof. Let B x = A - pll T . Observe that B x is a random matrix where each 
off-diagonal entry is either (1 — p) with probability p or —p with probability (f — p). 
Therefore, by Wigner's semi-circle law and the concentration of the eigenvalues of 
random symmetric matrices with i.i.d entries of absolute value at most 1 [T], the 
largest eigenvalue (in absolute value) of B\ is ||Bi|| 2 = Cp (v^)- Then we have 



151 



< 



Bi 
Si | 



{\ T A\) 11 



i\ r A\\ 



T1 J 



<P*i|_ 
= Op (yfn) 



where the second inequality uses the Chernoff bound. □ 

Remark. The matrix A is the adjacency matrix of the Erdos-Renyi (ER) random 
graph model Q(n,p) where the expectation of every node's degree is (n — l)p. Denote 
by Xi the i'th largest eigenvalue of a matrix. It is intuitive to see that Ai (A) w 
(n — l)p and the first eigenvector is approximately the all-ones vector 1, and A 2 (A) rs 
\ 1 (B) = P (y/n). 

Using these observations, we can bound the sum of norm of all 's by the trace 
of T. 

Lemma 4.8. We have the following three inequalities for the matrix T: 



1. 
2. 
3. 



E 



rppq 



< \Tr{T p P)2 Tr{T qq Y for p,q= 1,2,. 
< A Tr (T) , 

< A Tr (T) . 

Here the notation for the inner product of two matrices (X,Y) means Ti(X T Y). 
Proof. 1. Since T 0, we can assume T pq = (u?,u|), where u P (i = 1, . . . ,n, 
p = 1, . . . , d) is an n dimentional row vector, then Tr (T pp ) = Y17=i ( u f> u f)' an< ^ 



t^ = k,...,<)(^®/„)K,...,<) t . 
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We claim that J27=i u ? = ® ^ n f ac ^' smce T G S <g) <S, for any p, q = 1, . . . , d, we have 
E"j=i T |f = s p Ts g = • Therefore, we obtain 



«=E^=E<«> = (E^E^) = 

i,j=i 2,i=i \i=i j=i / 



E u ? 



and thus we have 



£u? = Ofarp=l,...,d. 



(4.12) 



Let v^j (to = 1, . . . , n, p = 1, . . . , d) be a n dimensional row vector such that 



then we have £™=i ||v£j| = Tr (T pp ), and (v^, 1) = due to (4.12). Therefore 



E ^ 



(u p 1 ,...,u p n )(A®I n )(u q 1 ,...,u q n y 



rn — 1 

<E A II V ™INI V ™I 



\ m / \ m 

= ATr (T pp )' Tr (T qq )^ 



where the first inequality uses Lemma 4.7 and the fact that vj^ (to = 1, . . . ,n, p 



1, . . . , d) is orthogonal to the all-ones vector 1, and the second inequality uses Cauchy- 
Schwarz inequality. 



3. From 1. we have 



< ATr (T) is clear from 1. when p = q. 



E r« 




E 


' E 




= \ 







< ^ATr (T») * Tr (T««) * 



p,</ 



= A Tr (Tpp) Tr (T?«) 

= A^Tr(T pp ) 

p 

= ATr (T) . 
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4.4. Bounding the "gain" from incorrect measurements. To make the 
intuition that "gain" from incorrect measurements is always upper bounded by the 
"loss" from correct measurements formal, we shall first focus on (i,j) £ c , and bound 
the "gain" by trace of T and norms of QL. Recall that Gij — Id- Denote the objective 
function by F, that is, 

F(G) = \\Gii-Rij\\- (4-13) 

Then, 

F(G + A)-F(G)= J2 (\\I d -R l3 +^\\-\\I d -R l3 \\) 
(<,i)e£ 

= £ (\\l d -R ij + A iJ \\-\\I d -R ij \\) + £ ||Ay|| 

(i,j)e£\£ c (ij)e£ c 
=:/i+/ 2 . (4.14) 

Here f\ is the "gain" , we shall bound it using the following lemma. 
Lemma 4.9. For any pair of non-zero matrices M\ ,M 2 G R mxm , we have 

||Mi + M 2 || - ||Mi|| > (Mi/ ||Mi|| , M 2 ) . (4.15) 



Proof. Using Cauchy-Schwarz inequality, we obtain 



HMJ 2 + (Mi,M 2 ) = (M 1 ,M 1 + M 2 ) < ||Mi|| ||M X + M 2 || , 

which is equivalent to ( 4.15| ). □ 

We apply Lemma |4.9| to the "gain" f\ and obtain 



h> E 

(i,j)ee\£ c 

= E 

(i,j)££\£c 



Id - R 

\I d — Rij 



13 A-- 



Id — Rij 

| id - i2i.il 



(4.16) 



First we shall bound the "gain" from the matrix P. Since blocks of P are the same, 
the average 



1 



\S\Sc 



E 



(i,j)eS\£ c 



Id — R 



\Id — R 



should be concentrated around the expectation of / ||/^_p' J || > PyVThe expectation is 
analyzed in Appendix |A| By ( 4.7 )-( 4.10) and the law of large numbers, we obtain that 



E 

(i,j)e£\£ 



Id — Ri 



Id — Rij | 

(c (d) (1 -p) n (n - 1) + Op (n0og^) ) Tr (P u ) 
- (c (d) (1 - p) (n - 1) + Op (V^)) Tr (T) , 



(4.17) 
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where the last equality uses Lemma |4.3[ c (d) defined in (4.3 ) and the rotation matrix R 
is uniformly sampled from the group SO (d) (see Appendix [A] for detailed discussion). 
For matrix Q we use similar concentration bounds 



El I d ~ R ij n 
\ 117",- /?„■„■ 1 1 ' J ' 



(i,j)e£\s e 



\\Id 


— RijW 


Id 


— R^ 


\\Id 


— RijW 


Id 


— Rij 


\\Id 


— Rij\\ 



(i,j)ee\s, 



(i,j)££\£c 

> ^ (n - p (n - 1)) (c (d) I d , Q)j) + (n - p (n - 1)) (c (d) J d , ) 

j » 

- Op (Vnlogn) ^ ||Q^ || - O p (^/nlogn) £ ||Q* || 

i i 

= (n - p (n - 1)) ^cl d , £ Qjj ^ + (n - p (n - 1)) ^c/ d , £ Q?^ 

-Op(V^i^)2||Qi,.|| 
= -0 P ( v ^log^)^||Q l 1 J ||, (4.18) 



where the third equality uses the fact that Q 1 — (Q 2 ) T , and the last equality follows 

( FTP - 

Finally we shall bound the "gain" from matrix T, which is 



(i,j)e£\£ C 



Before continuing, we need the following results in Lemma |4.10| for a matrix D, where 
D is defined as 

1 otherwise. 

The proof of Lemma |4.10| is in Appendix [B] 

Lemma 4.10. The limiting spectral density of ^-L— J) is Wigner's semi-circle. 
In addition, the matrix D can be decomposed as D = D + + £>_ , where D + )p and 
D- < ; and we have 

WD+W 2 «\\D_\f^^(l-p)n(n-l)(l-c (df d) . (4.21) 
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We return to lower bounding (4.19). Since J^ij Tij = 0, we have 
Id — Ri 



E 



(i,j)ee\e c 



\\Id — Ri, 



T 



Id — Ri 



= (c(d)i d ,r«)+ £ ( II? -7 II - c ( rf ) J ^ 

(i,j)£E\S c (i,3)e£\£ c X " d 

= - c (d) £ (J* T,,-) - c (d) (h, Tu)+J2 ( D V > T ^ 

(i,j)e£c « id 

= -c(d) (42ij>-c(d)Tr(T) + Tr(OT) 

= -c(d) £ (7 d ,T^}- C (d)Tr(T) + 1V(£) + T)+Tr(Z)_T) 

(i,j)6£c 

>-c(d) £ (/ d ,T„-) -c(d)Tr(T)+Tr(.D_T), 

(i,j)e£c 



(4.22) 



where the last inequality follows from the fact that Tr (D+T) > since both T and 
.D+ are positive semidefinite matrices. Recall that for any positive semidefinite matrix 



X the following inequality holds: \\X\\ < Tr (X). Since T !>= 0, using (4.21 1 we obtain 
|Trp_T)| < ||£>_||||T|| < || Tr (T) « -^v^T^) ^ 1 - c (d) 2 dnlt (T) . (4.23) 
Also, Lemma 14.81 reads 



E Ti -? 

\ (i,i)ef / 



< ATr (T) . 



(4.24) 



Combining ( |4.22[ ), ( |4.23[ ), and ( |4.24[ ) together gives 



E 

(ij)e£\£ c 



X ■ 

\Id — Rij\\ 



(c (d)X + c (d) + ~ ^(i - p ) .^/i - c (d) 2 dn^) Tr (T) . (4.25) 



> 

Since from Lemma |4.7| we have A = Op (y^)) the bound is given by 

— R4 



E 

(ij)G£\£ c 



y /Ti 

! J-lj 



> 



(-L^/ll^^l-cidfdn + Op (Vn)) Tr (T) 



(4.26) 



Combining ( 4.16[ ), (4.171, (4.181 and (4.26) together we obtain a lower bound for 
the gain from A as following: 

fi>- ^^0~~p)^l-c(dfdn + c (d) (1 - p) n + Op (v^)) Tr (T) 

-0 P (v^bg^) £ HQlill • (4.27) 
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4.5. Bounding the "loss" from correct measurements. Now we consider 
the part f 2 = j)ee ll^y'll' which is the loss from the correct entries. We use the 
notations A^ and A°? to represent the restrictions of the sub-matrices A,-j on the 
diagonal entries and off-diagonal entries respectively. We will analyze j)e£ W^ij II 

and £(i,j)e£c ll A °f II separately. 

For the diagonal entries we have 



E Kl 



E \\Pti + Q% + T i 



> 



> 



(i,j)e£c 



E ?$ + Qb 

(i,j)e£c 

E T " + T n 

(ij)e£c 





e n 




(i,j)e£a 




E n 




(i,j)e£c 



= pn 



E^ 



— Op ( \/n log 7i ) Tr (T) 



E * 



> -^Tr (T) - Op (y/nlognj Tr (T) - ATr (T) 



-Op(^logn)^Tr(T), (4.28) 
where the second equality uses Lemma \4A\ the third equality uses the law of large 



pn 



numbers and Chernoff bound, the last inequality follows Lemma 4.6 and Lemma |4.8| 
and the last equality uses the fact that A = Opi^fn) from Lemma 4.7 

For the off-diagonal entries, we have the following lemma, whose proof is deferred 
to Appendix [C] 

Lemma 4.11. If the proportion of edges £ £ c is greater than „, that is, 

p> \, then 



E A 

(i,j)e£e 



> 



d 2 



Op(n)J2\\Qu\\-Op (n)Tr(T) 



(4.29) 



Finally, we can merge the loss in two parts by a simple inequality: 
Lemma 4.12. If y\ |oi| > a and \bi\ > b, then for any a, f3 > such that 
a 2 + (3 2 — 1, we have 



E \/ a i +b i ^ aa + fib. 



Applying Lemma |4.12| to (|4.28|) and (|4.29| , and setting a = 1 — -7= and /3 
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-?= — hi we obtain a lower bound for the loss from A when p > i: 




(4.30) 



4.6. Finishing the proof. Now that we have bounded the "gain" and the 
"loss" , we just need a condition on p such that the "loss" is greater than the "gain" 
(w.h.p.). Combining (4.14), (4.271 and (4.30) together, when p > 1/2 we obtain 



F(G + A) — F(G) = fx + h 
> ( (p/Vd - ~ ^l^p)^/l-c(dj 2 d - c (d) (1 - p)) 



Of 



n 1 



Tr(T) 
(4.31) 



that is, when the number of rotations n is large enough, we just need 

2 

-ex (d) + \ jc 1 (d) 



yjc! (d) + 4 (c (d) + 



V 



2 (c(d) + 



/ 



where c±(d) is defined as (4.4) □ 



5. Stability of LUD. In this section we will analyze the behavior of LUD when 
the measurements Rij on the "good" edge set £ c are no longer the true rotation r atio s 
RjRj, instead, Rij are small perturbations of RfRj- Similar to the noise model (4.1 ), 
we assume in our model that the measurements Rij are given by 



R; 



$ij Fi^ij 



(1 



(5.1) 



where the rotation Rij is sampled from a probability distribution (e.g. the von Mises- 
Fishcr distribution 0; c.f. Section 7.2) such that 

E(\\Rij - R?Rj\\) =e, and Var(p„- - Rf R 3 1|) = O (e 2 ) . (5.2) 

We can generalize the analysis for exact recovery to this new noise model ( |5.1[ ) 
with small perturbations on the "good" edge set and prove the following theorem. 

Theorem 5.1. Assume that all pairwise ratio measurements Rij are generated 
according to (5.1) such that the condition (5.2) holds for a fixed small e > 0. Then 
there exists a critical probability p*(d) such that when p > p*(d), the solution G to the 
optimization problem (3.S) is close to the true Gram matrix G in the sense that 

2 



G-G 



< O (n 2 ) 



w.h.p. (as n — > oo). Moreover, an upper bound p c (d) for p*(d) is given by (4-2). 
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Proof. First, the "gain" fx from the incorrect measurements remains the same, 
since the noise model for the "bad" edge set £\£ c is not changed. Thus, the lower 
bound for fx is given in ( |4.27| . For the "loss" from the good measurements, we have 

/ 2 = E (11^-^+^11-11^-^11) 

> e (\\*ij\\-nid-Rii\\) 

> E l|A u ||-2p(n 2 + P (nyroi^))e. (5.3) 



Applying Lemma 4.12 to (4.28) and (4.291, and setting a = y/1 — e and j3 = y^io 
where eo <C p — p c , we obtain 



E \\^\\>vr^~o E \\ A U+v^ E \\ A °P 

(i,j)6£c 



n 

> c 2 nTr (T) + c 3 n^ ||Q^|| - O p Unlogn) Tr (T) , (5.4) 



i=l 



where c 2 and C3 are some constants. Thus, combining (4.271, (5.3) and (5.4) together, 
we get 

F(G + A) — F(G) 
= h + h 

n 

> c 4 nTr(T) + c 3 nJ2\\Qu\\-°P (V nl °gn) Tr(T) 

i=l 
n 

-Op (V^logn) E ll^«ll - 2 *> ( n2 +°P ("Vlog^)) e, (5-5) 



where C4 is some constant. Thus, if the RHS of (5.5) is greater than zero, then G + A 
is not the minimizer of F. In other words, if G + A is the minimizer of F, then the 
RHS of (5.5) is not greater than zero. Let n — > 00 in (5.5), we obtain the necessary 
condition for the minimizer G + A of F: 

71 

c 4 Tr(T)+c 3 J2\\Qu\\ < 2 P n ^ ( 5 - 6 ) 



To show that condition (5.6) leads to the conclusion that the amount of perturbation 

II A II 2 



i^n 2 + iiQii 2 + imi 2 



(5.7) 



is less than O (n 2 ) e 2 , we need the following lemmas to upper bound ||A|| 2 by parts. 
Lemma 5.2. 



\P\\ < Tr(T) 2 < O in 2 ) e 2 , and ||T|| < Tr(T) < O (n 2 ) e 2 



(5.8) 
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Proof. Using the fact that for any positive semidefinite matrix M, \\M\\ < Tr (M), 
we have 



|P|| 2 = n 2 ||Pu|| 2 = 



E T « 



<Tr l^Ta) = Tr (T) 2 



(5.9) 



\T\\ 2 < Tr (T) 2 . 



(5.10) 



And following (5.6) we obtain (5.8). □ 



Lemma 5.3. 



IIQH 2 < 27V(T) 2 - 2Tr(T) + nd. 



(5.11) 



T 



Proof. Since G + A ^ 0, we can decompose G + AasG + A — J27=i AiV^v, 
where the eigenvectors £ M. nd and the associated eigenvalues A^ > for all i. Then 
we further decompose each Vj as Vj = vf + vf , where vf £ S and vf s 5. Thus we 
obtain 



n (i 



nd 



G + A = Y1 = £ A, (vf + vf ) (vf + vf ) (5.12) 

nd nd rp nd rp nd 

E A * v f( v f) T +E A * v f( v f) +E A » v f( v f) +E A * v f( v f) T - 



i=l 



On the other hand, we can decompose G + A as 



G + A = (G + P) + Q i + Q 2 +T 



(5.13) 



where G+Pe5«5, Q 1 £ S ®S, Q 2 £ S®S and T £ S ®S. By comparing the 



right hand sides of (5.12) and (5.13), we conclude that 



G + P = ^A,vf(vf) T , T = J>vf(vf) , 

i=l i=l 
nd rp nd 

Qi = ^A iV f(vf) , Q 2 = ]>>vf (vff^Q 1 ) 
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Therefore, we have 



|Q|| 2 = 2||Q 1 | 



TIU r 

E^ v ?( v f) 



nd 



i=l 
nd 



2Tr &vf(v?)t^ V 



2^A 1 A J (vf) T vf (vf) vf 



nd 

< ^ 

mi 



vf) vf(vf) T v 



i i=i 



/ no, rp na 



i=l 



= iiG+pir + iiTir. 

Using the fact that G = J n( j and n?n = — T^, we obtain 

\\G + P\\ 2 = \\G\\ 2 + \\P\\ 2 + 2Tt(G,P) 
= nd+ \\P\\ 2 - 2Tr(T) . 



(5.14) 



(5.15) 



Combining Q, Q, ( [5715] ) an d jOg ), we get ( [sTTI] ). □ 

Using Lemma 5.2 and Lem ma |5.3| we reach the conclusion in the theorem. □ 

\Q\\ < O (n 2 ) e 2 holds when e ^> which leads 



Remark. Using Lemma 



5.3 



to the weak stability result of LUD stated in Theorem 1 5 . 1 1 1 hat requires e to be fixed. 
In fact, a stronger stability result of LUD that allows e — > as n — > oo can also been 
proven using ideas similar to the proof of Theorem |4.1| The proof of strong stability 
is very technical and is omitted from this paper. Instead, it can be found in the 
dissertation of the first author. 

6. Alternating Direction Augmented Lagrangian method (ADM). Here 
we briefly describe the ADM |37j to solve the non-smooth minimization problem (3.3 ). 
ADM is a multiple-splitting algorithm that minimizes the dual augmented Lagrangian 
function sequentially regarding the Lagrange multipliers, then the dual slack variables, 
and finally the primal variables in each step. In addition, in the minimization over a 
certain variable, the other variables are kept fixed. The optimization problem (3.3 1 
can be written as 



4£uEii^-. 

i<3 



where the operator A 



pndxnd 



A(G) 



pnd 



s.t. A{G) = b, Xij = R4 
l nd2 is defined as 

)»=1 n,p,q=l,...d 



(X (p=q) (p, '/:),,, 
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Gij, 



(6.1) 



Since 



max min £(||Xy|| - (%, Xy - + G tj )) - (y, 4(G) - b) - (G,W) 



«*■ max min - (Q (6») + + X (y) , G) + yb 2 

Sij ,y,W^O Xij,G 



where 



/ o 



Q{0) 



9 T 

y 12 



"12 





V C 61 



T 
2m 



£(11* 

i<j 



'lm 
92m 



(6.2) 



We want to first minimize the function over Xy and G in (6.2 1. The rearrangement of 
terms in (6.2 1 enable us to minimize — (Q (9) + W + A* (y) , G) over G and minimize 
\\Xij || - Xy) over Xy, i < j separately. To minimize — (Q(8) + W + A* (y) , G) 



over G, the optimum value will be — oo if Q (9) + W + A* (y) ^ 0. Therefore due to 
the dual feasibility Q (9) + W + A* (y) = and the optimum value is zero. Since 



= 11^11-11^11 ||M (%/||%||,^/||x y -| 

= j|X 13 ||(l-||^||(%/||%||,X tJ /||X iJ ||)) 

>||Xy||(i- PvW), 



(6.3) 
(6.4) 



to minimize IIXij|| — (9ij> Xij) over Xij, if ||0y|| > 1, then let Xij = o&ij, a > and 
($ij, X^) — a \\0ij\\ (1 — ||%||) goes to — oo if a goes to +oo. 
< 1 and from (6.4 1 we get the optimum value is zero where Xjj = 0. 



then from (6.3| ||Xy 
Hence 



>ij 

Therefore the dual problem is 

min -yb T - V (%, R l3 ) s.t. ||%|| <1, Q (9) + W + A* (y) = 0. 



(6.5) 



The augmented Lagrangian function of the dual problem (6.5) is 



C (y, 6, W, G) = -yb T - ]T R l3 ) + (Q(8) + W + A* (y) , G) (6.6) 



i<j 



+%\\Q(0) + W + A*(y)f F , s.t. 



Ml <1, 

where /i > is a penalty parameter. Then we can devise an alternating direction 
method (ADM) that minimizes (6.6) with respect to y, 9,W, and G in an alternat- 
ing fashion, that is, given some initial guess y ,^ ,!^ , and G°, the simplest ADM 
method solves the following three subproblems sequentially in each iteration: 



Jfc+l 



Qk+1 



axgmhiC{y,6 k ,W k ,G k ) 
y 



= arg mm 



C(y k+1 ,9,W k ,G k ), s.t. <1 



W k+1 = arg min C (y k+1 , 9 k+1 , W, G k ) 
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(6.7) 
(6.8) 
(6.9) 



and updates the Lagrange multiplier G by 



G k 



fc+l _ 



G k + (Q (0 k+1 ) + W k+l + A* (y fc+1 )) 



(6.10) 



where 7 e ^0, 1+ 9 V ^ ^ is an appropriately chosen step length. 
To solve (6.7), set V y £ = and using AA* = /, we obtain 



r k+l 



A (Q (6 k 



W 



(A(G k 



By rearrangement of terms of £, it is easy to see problem (6.8 1 is equivalent to 

/' 



%j, Rij) 



1^-^11^,^.11^11 < 1 



where <& = W k + A* (y fe+1 ) + ^G fe . And it can be further simplified as 

min 

On 



min<%,^ - Rij) + | ||%H , s.t. ||%|| < 1 



whose solution is 



^Rij-Qij if 



< 1, 



otherwise., 



Problem (6.9) is equivalent to 



mm\\W - H k \\ F , s.t. W V 0, 

where i? fc = -Q (6> fc+1 ) - A* (y fc+1 ) - ^G fc . Hence we obtain the solution W k+1 
V + E + VT, where 



\Tr = n: j: i ( E + E °_ 



is the spectral decomposition of the matrix H k , 
negative eigenvalues of H k . 



and and £_ are the postive and 



Following (6.10), we have 

G k+1 = (1 - 7) G k + 7/x (iy fcj 



The convergence analysis and the practical issues related to how to take advan- 
tage of low-rank assumption of G in the eigenvalue decomposition performed at each 
iteration, strategies for adjusting the penalty parameter fj,, the use of a step size 
7 for updating the primal variable X and termination rules using the in-feasibility 
measures are discussed in details in [37]. The time complexity of ADM for LUD is 
0(n 3 /6), where 6 is the accuracy of ADM. Since in each iteration, the most expensive 
process is the Arnoldi iterations [TH] to compute the eigen-decomposition of W +1 
whose cost is C(n 3 ), and according to the convergence rate analysis of ADM in [13], 
we need 0(1/ 5) iterations to reach a 5 accuracy. Note that the time cost 0(n 3 ) of an 
eigen-decompostion can be reduced if only a few largest eigenvalues/eigenvectors are 
needed to compute, which we can take advantage of since the solution G is low-rank 
(see more details in section 3.1 in [37]). 
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7. Numerical experiments. All numerical experiments were performed on a 
machine with 2 Intel(R) Xeon(R) CPUs X5570, each with 4 cores, running at 2.93 
GHz. We simulated 100, 500 and 1000 rotations in the groups SO (2) and SO (3) 
respectively. The noise is added to the rotation ratio measurements according to the 



ER random graph model Q (n,p) (4.1 1 and the mo del (|5.1[ ) with small perturbations 
on "good" edge set in subsection |7. 1 1 and subsection |7.2[ respectively, where n is total 
number of rotations, and p is the proportion of good rotation ratio measurements. 

We define the relative error of the estimated Gram matrix G as 



(g,G)= G-G /\\G\\ , (7.1) 



RE 



and the mean squared error (MSE) of the estimated rotation matrices R\, . . . , R n as 



1 n II 2 
MSE =- V \\Ri - ORt , (7.2) 
n £ — ' II 

»=i 



where O is the optimal solution to the registration problem between the two sets of 
rotations . . . , R n } and j-Ri, . . . , Rn\ in the sense of minimizing the MSE. As 

shown in [28], there is a simple procedure to obtain both O and the MSE from the 
singular value decomposition of the matrix — ^™=i ^i^J • For each experiment with 
fixed d, n, p and k, we run 10 trials and record the mean of REs and MSEs. 

We compare LUD to EIG, SDP [27] (using the SDP solver SDPLR [5]). EIG and 



SDP are two algorithms to solve the least squares problem in (2.1 1 with equal weights 
using spectral relaxation and semidefinitc relaxation, respectively. LUD does not have 
advantage in the running time. In our experiments, the running time of LUD using 
ADM is about 10 to 20 times slower than that of SDP, and it is hundreds times slower 
than EIG. We will focus on the comparison of the accuracy of the rotation recovery 
using the three algorithms. 



7.1. El: Exact Recovery by LUD. In this experiment, we use LUD to recover 



rotations in SO (2) and SO (3) with different values of n and p in the noise model (4.1 ). 
Table [7TT] shows that when n is large enough, the critical probability where the Gram 
matrix G can be exactly recovered is very close to p c (2) pa 0.6237, p c (3) ps 0.6966. 

The comparison of the accuracy of the estimated rotations by EIG, SDP and LUD 
is shown in Tables |7.2|7.3| and Figure |7.1[ that demonstrate LUD outperforms EIG 
and SDP in terms of accuracy. 
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^\ p 

n 


0.9 


0.8 


0.7 


0.6 


0.5 


0.4 


100 


0.0002 


0.0004 


0.0007 


0.0007 


0.0613 


0.3243 


500 


0.0002 


0.0002 


0.0004 


0.0005 


0.0007 


0.1002 


1000 


0.0001 


0.0002 


0.0003 


0.0006 


0.0007 


0.0511 



(a) SO (2) 



^\ p 

n 


0.9 


0.8 


0.7 


0.6 


0.5 


0.4 


100 


0.0001 


0.0001 


0.0002 


0.0029 


0.1559 


0.4772 


500 


0.0001 


0.0001 


0.0001 


0.0002 


0.0011 


0.3007 


1000 


0.0001 


0.0001 


0.0001 


0.0002 


0.0007 


0.2146 



(b) SO (3) 



Table 7.1: The Relative Error (7.1) of the Gram matrix G obtained by LUD in El. 
The critical probability where the Gram matrix G can be exactly recovered is upper 
bounded by p c (2) ss 0.6237, p c (3) ~ 0.6966 when n is large enough. 



p 


0.7 


0.6 


0.5 


0.4 


0.3 


0.2 


EIG 


0.0064 


0.0115 


0.0222 


0.0419 


0.0998 


0.4285 


SDP 


0.0065 


0.0116 


0.0225 


0.0427 


0.1014 


0.3971 


LUD 


1.7e-07 


4.7e-08 


8.4e-05 


0.0043 


0.0374 


0.3296 



(a) n = 100 



p 


0.7 


0.6 


0.5 


0.4 


0.3 


0.2 


EIG 


0.0012 


0.0023 


0.0040 


0.0077 


0.0163 


0.0445 


SDP 


0.0012 


0.0023 


0.0041 


0.0078 


0.0164 


0.0440 


LUD 


6.4e-10 


5.5e-09 


9.6e-09 


6.3e-05 


0.0025 


0.0211 



(b) n = 500 



p 


0.7 


0.6 


0.5 


0.4 


0.3 


0.2 


EIG 


0.0006 


0.0011 


0.0020 


0.0037 


0.0080 


0.0207 


SDP 


0.0006 


0.0011 


0.0020 


0.0037 


0.0081 


0.0207 


LUD 


3.0e-10 


1.5e-09 


7.3e-09 


7.5e-06 


0.0010 


0.0084 



(c) n = 1000 



Table 7.2: MSE (|7_2j of the estimated rotations in SO (2) using EIG, SDP and LUD 
in El. The critical probability where the rotations can be exactly recovered is upper 
bounded by p c (2) ss 0.6237 when n is large enough. 
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p 


0.7 


0.6 


0.5 


0.4 


0.3 


0.2 


EIG 


0.0063 


0.0120 


0.0224 


0.0435 


0.0920 


0.3716 


SDP 


0.0064 


0.0122 


0.0233 


0.0452 


0.0968 


0.4040 


LUD 


1.0e-09 


6.4e-07 


4.1e-04 


0.0094 


0.0461 


0.2700 



(a) n = 100 



p 


0.7 


0.6 


0.5 


0.4 


0.3 


0.2 


EIG 


0.0012 


0.0023 


0.0041 


0.0080 


0.0164 


0.0442 


SDP 


0.0012 


0.0023 


0.0041 


0.0080 


0.0165 


0.0447 


LUD 


4.7e-ll 


1.8e-10 


2.1e-09 


0.0006 


0.0061 


0.0295 



(b) n = 500 



p 


0.7 


0.6 


0.5 


0.4 


0.3 


0.2 


EIG 


0.0006 


0.0011 


0.0020 


0.0037 


0.0079 


0.0208 


SDP 


0.0006 


0.0011 


0.0020 


0.0038 


0.0079 


0.0209 


LUD 


2.5e-ll 


2.4e-10 


8.0e-10 


O.OOOl 


0.0026 


0.0131 



(c) n = 1000 



Table 7.3: MSE (|7_2j of the estimated rotations in SO (3) using EIG, SDP and LUD 
in El. The critical probability where the rotations can be exactly recovered is upper 
bounded by p c (3) ~ 0.6966 when n is large enough. 
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Fig. 7.1: MSE (|7_2|) of the estimated rotations in SO (3) and SO (2) using EIG, SDP 
and LUD for different values of n and p in El. Exact Recovery (i.e. \og 10 (MSE) < 
—7) is achieved by LUD. 



7.2. E2: Stability of LUD. In this experiment, the three algorithms are used 



to recover rotations in SO (2) with different values of n, p in the noise model (5.1 ). 
In (5.1 1, the perturbed rotations Rij for the "good" edge set are sampled from a von 
Mises-Fisher distribution [5] with mean rotation Rf Rj and a concentration parameter 
k > 0. The probability density function of the von Mises-Fisher distribution for R^ 
is given by: 

/ (/,', ,:/,';' /,',./,•) = c (k) exp (/tTr ( /.'' /.', /.',.,) ) , 

where c (n) is a normalization constant. The parameters Rf Rj and 1/n are analogous 
to /i (mean) and a 2 (variance) of the normal distribution: 

1. RfRj is a measure of location (the distribution is clustered around Rj Rj). 

2. k is a measure of concentration (a reciprocal measure of dispersion, so 1/k is 
analogous to a 2 ). If k is zero, the distribution is uniform, and for small k, it 
is close to uniform. If k is large, the distribution becomes very concentrated 
about the rotation RfRj. In fact, as k increases, the distribution approaches 
a normal distribution in R^ with mean RfRj and variance 1/k. 

For arbitrary fixed small e > 0, we can choose the concentration parameter n large 



enough so that the condition (5.2 1 is satisfied. In fact, using Weyl integration formula 
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(A. 2 1, it can be shown that when k — > oo, 



Vax(||i2ij - RfRj 



(1 



IT K 



0(k 



-3/2\ 



Figures 7.2a 7.2b and 7.3 show that for different n, p and concentration parameter 



k, LUD is more accurate than EIG and SDP. 

fn addition we observe that when the concentration parameter k is as large as 100, 
which means the perturbations on the "good" edges are small, the critical probability 
p c for phase transition can be clearly identified around 0.5. As n decreases, the phase 
transition becomes less obvious. 
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Fig. 7.2: MSE |7_2j) of the estimated rotations in SO (2) using EIG, SDP and LUD 
for 100 and 500 rotations and different values of p and concentration parameter k in 
E2. LUD is stable when there are small perturbations on the "good" edge set. 
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Fig. 7.3: MSE of n = 100 estimated rotations in SO(2) as a function of k when p = 0.7 
using LUD, EIG and SDP. The recovery by LUD is stable to small perturbations on 
the "good" edges as indicated by the linear relationship of log(MSE) and log(re). 



E3: Real data experiments. We tried LUD for solvi ng the global alignment 
problem of 3D scans from the Lucy dataseiQ (see Figure 7.4). We are using a down- 
sampled version of the dataset containing 368 scans with a total number of 3.5 million 
triangles. Running the automatic Iterative Closest Point (ICP) algorithm [21] starting 
from initial estimates returned 2006 pairwise transformations. For this model, we 
only have the best reconstruction found so far at our disposal but no actual ground 
truth. Nevertheless, we use this reconstruction to evaluate the error of the estimated 
rotations. 

We apply the two algorithms LUD, EIG on the Lucy dataset since we observed 
SDP did not perform so well on this dataset. Although the MSEs are quite simi- 
lar (0.4044 for LUD and 0.3938 for EIG), we observe that the unsquared residuals 

Ri — Ri (i = 1,2,... ,368), where R\ is the estimated rotation, are more concen- 
trated around zero for LUD (Figure 7.5). Figure 7.6 suggests that the "bad" edges 



(edges with truly large measurement errors in the left subfigure of Figure 7.6) 
can be eliminated using the results of LUD more robustly, compared to that of EIG. 



We set the cutoff value to be 0.1 in Figure 7.6 for the estimated measurement errors 
obtained by LUD and EIG. Then 1527 and 1040 edges are retained from 2006 edges 
by LUD and EIG respectively, and the largest connected component of the sparsified 
graph (after eliminating the seemingly "bad" edges) has size 312 and 299 respectively. 
The 3D scans with the estimated rotations in the largest component are used in the 
reconstruction. The reconstruction obtained by LUD is better than that by EIG 



(Figure 7.7) 



7 Available from The Stanford 3-D Scanning Repository 



at 



graphics.stanford.edu/data/3Dscanrep/ 



|http://v 
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Fig. 7.4: The Lucy statue. 



EIG MSE=0.393B 




SDP MSE=0.4314 



LUD MSE=0.4044 



Fig. 7.5: Histogram of the unsquared residuals of EIG, SDP and LUD for the Lucy 
dataset. The errors by LUD are more concentrated near zero. 




fa- V " "'Vl 




Fig. 7.6: True and estimated rotation ratio measurement errors for the Lucy dataset. 
The measurements are sparse since only 2006 rotation ratios were measured for 368 
3-D scans. Left: the value at the (i,j)th element is \\Rij — RjRj\\ if it is non-empty, 
where i, j — 1,2,..., 368. Middle and Right: similar to Left except that the value at 



the (i, j)th element is Rij — RjR 



if it is non-empty. 
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Fig. 7.7: Front and back views of the reconstructions from the Lucy dataset. The left 
one is the best found reconstruction. The middle and right ones are obtained by LUD 
and EIG. 



8. Summary. In this paper we proposed to estimate the rotations using LUD. 
LUD minimizes a robust self consistency error, which is the sum of unsquared resid- 
uals instead of the sum of squared residuals. LUD is then semidefinite relaxed and 
solved by ADM. We compare LUD method to EIG and SDP methods, both of which 
are based on least squares approach, and demonstrate that the results obtained by 
LUD are the most accurate. When the noise in the rotation ratio measurements comes 
from ER random graph model G (n,p), we compute an upper bound p c of the phase 
transition point such that the rotations can be exactly recovered when p > p c . More- 
over, the solution of LUD is stable when small perturbations are added to "good" 
rotation ratio measurements. 
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The exact recovery result for the noise model (4.1 ) is actually not that surprising. 
In order to determine if the rotation measurement for a given edge is correct 

or noisy, we can consider all cycles of length three (triangles) that include that edge. 
There are n — 2 such triangles, and we can search for a consistent triangle by multi- 
plying the three rotation ratios and checking if the product is the identity rotation. 
If there is such a consistent triangle, then the measurement for the edge is al- 
most surely correct. The expected number of such consistent triangles is (n — 2)p 2 , 
so the critical probability for such an algorithm is p c = 0(l/y/n), which is already 
significantly better than our exact recovery condition for LUD for which the critical 
probability does not depend on n. In fact, by considering all cycles of length k > 3 
consisting of a given fixed edge (there are 0{n k ~ 2 ) such cycles), the expected number 
of consistent cycles is 0(n k ~ 2 p k ~ 1 ), therefore the critical probability is 0(l/n 1_e ) 
for any e > 0. Since exact recovery requires the graph of good edges is connected, 
and the ER graph is connected almost surely when p = 21 ° s " , the critical probabil- 
ity cannot be smaller than p = 21 ° s " . The computational complexity of cycle-based 
algorithms increases exponentially with the cycle length, but already for short cycles 
(e.g., k — 3) they are preferable to LUD in terms of the critical probability. So what 
is the advantage of LUD compared to cycle-based algorithms? The answer to this 
question lies in our stability result. While LUD is stable to small perturbations on 
the good edges, cycle-based algorithms are unstable, and are therefore not as useful 
in applications where small measurement noise is present. 

An iterative algorithm for robust estimation of rotations has been recently pro- 
posed in 12] for applications in computer vision. That algorithm aims to minimize 
the sum of geodesic distances between the measured rotation ratios and those derived 
from the estimated rotations. In each iteration, the algorithm sequentially updates 
the rotations by the median of their neighboring rotations using the Weiszfeld algo- 
rithm. We tested this algorithm numerically and find it to perform well, in the sense 
that its critical probability for exact recovery for the noise model (4.1) is typically 
smaller than that of LUD. However, the objective function that algorithm aims to 
minimize is not even locally convex. It therefore requires a very good initial guess 
which can be provided by either EIG, SDP or LUD. In a sense, that algorithm may 
be regarded as a non-convex refinement algorithm for the estimated rotations. There 
is currently no theoretical analysis of that algorithm due to the non-convexity of its 
underlying objective function. 

In the future, we plan to extend the LUD framework in at least two ways. First, we 
plan to investigate exact and stable recovery conditions for more general (and possibly 
weighted) measurement graphs other than the complete graph considered here. We 
speculate that the second eigenvalue of the graph Laplacian would play a role in 
conditions for exact and stable recovery, similar to the role it plays in the bounds for 
synchronization obtained in [3]. Second, the problem of synchronization of rotations 
is closely related to the orthogonal Procrustes problem of rotating n matrices toward 
a best least-squares fit [33]. Closed form solution is available only for n = 2, and the 
SDP relaxation for n > 3 was analyzed in [22]. The LUD framework presented here 
can be extended in a straightforward manner for the orthogonal Procrustes problem 
of rotating n matrices toward a best least-unsquared deviations fit, which could be 
referred to as the robust Procrustes problem. Similar to the measurement graph 
assumed in our theoretical analysis of LUD for robust synchronization, also in the 
Procrustes problem the measurement graph is the complete graph (all pairs of matrices 
are compared). 
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Appendix A. The value of c(d). Now let us study the constant c{d) such 
that c(d)I d = E ( p^Ejfp ) i where R is uniformly sampled from the rotation group 
SO(d). Due to the symmetry of the uniform distribution over the rotation group 
SO(d), the off-diagonal entries of E ( ^ j are all zeros, and the diagonal entries 

of E ( i|j^Zfl|| J are all the same as the constant c(d). Using the fact that 



h-R\\ = jTr((/ d -i?) (I d -Hy 



= jTr(2I d -R-R T ) 



= v/2Tr(/ rf -i?), 



we have 



1 (Tr(I d -R)\ 1 



E 



\^2Tr(I d -R)J V2d V V Kd >)> 

pvnprtatinn is trip T-Ta.flr mpfisiirp Trip fiinrti 



c{d) = d E {-Jl^Mi 

where the probability measure for expectation is the Haar measure. The function 
Tr (I d — R) is a class function, that is, / (R) is invariant under conjugation, 
lat for all 0,R<E SO(d) we have / (ORO T ) = f (R). Therefore E (/ (i?)) 



/ (R) = yTr (I d — R) is a class function, that is, / (R) is invariant under conjugation, 
meaning that for all O, R e SO(d) we have / (ORO T ) = f (R). Therefore E (/ (i?)) 
can be computed using the Weyl integration formula specialized to SO(d) (Exercise 
18.1-2 in U) as below: 



/ / cos#i — sin 6^ 
sin 6*i cos 9i 



>SO(2m+l) 



f(R)dR=-±-[ f 



xIlO 



\ 



■**-e*°>\ le 



V 

2 I _t& 



cos 6 m — sin 9 rn 
sin 6 m cos 6 m 



\ 



\ 



— e 



JJ|e**-l| 2 d0 1 ...d0 fl 



/ / cos 9\ — sin 6*i 



J SO(2m) A m - J[— K,n] m 



sm 0\ cos 6*1 



VV 



n 

i<3 



^ -e^\ 2 \e^ -e-^\ 2 )d9 1 ...d6 r 
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1 // 



(A.l) 



cos 9 m - sin 9 m 
sm9 m cos9 m J J 

(A.2) 



In particular, for d = 2 or 3 (m = 1), using the Weyl integration formula we obtain 
c(2) = (v/Tr(/ 2 -i?)) 

/ nr\c £) — cin /3 \ \ 

)d6 



2V2 

1 1 
2^2 ' 2tt 



I Vl - cos 6d6 
1 f" 

— / Vl - cos 8d9 (let a; = cos 9) 
2tt Jo 



cos v -sins 
sin 9 cos 9 



1 



2tt vi + £ 

V2/7T, 



da; 



and 



(3) = (Vli(/3-B)) 



1 



3V2 2tt 

2V2 r £ 

Jo 



Tr / a 




(l-cos0)d0 



cos 6»(1- cos 0) d(9 




3tt 
4V2 

~3tT 
8\/2 
~9tT 



d# (let a; = cos -) 



da: 



For d > 4, the computation of c (d) involves complicated multiple integrals. Now we 
study the lower and upper bound and the limiting value of c (d) for large d. 

Lemma A.l. — , 1 < c(d) < —}=, where I • I denotes the floor of a number. 

Proof. Using the fact that the square root function is concave, we obtain 
c(<fl = -^E( v /TVft-fl)) 

Vd 1 



V2d 



2d 



where the second equality uses the fact that Tr (R) = due to the symmetry of the 
Haar measure. 

Since d - 4 [d/2\ < Tr (R) < d, we have 



< Tr (I d - R) < 4 [d/2\ . 
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(A.3) 



In the range |a3, y/Tr (I d - R) > 



2y/\d/2\ 



Tr (I4 — R) due to the concavity of the 



square root function. Therefore we obtain 



c(d) = -^E(y/Tr(I d -R) 



> 



pE (Tr (I d — R)) 



2d^[d/2\ 

d _ 1 
2d^\dj2\ ~ 2^[d/2\ 



Remark. The upper bound of c (d) is very close to c (d) for d — 2, 3, and 4: 
^= (« 0.3536) < c(2) = ^ (« 0.4502) < i 



J (w 0.3536) <c(3) = ^ (« 0.4001) < 



2^ 



0.25 = - < 
4 



9tt 

c(4) (ra 0.3505) 



< 



1 

i 



2^/2 



« 0.4082 
(ra 0.3536) 



In fact we can proof the following lemma. 
Lemma A. 2. lim^oo V2dc(d) = 1. 
Proof. We have 



1 > v^dc (d) = — (Vtt (/, 



> 



^pf|Tr(i2)| < 



d~ di . 



(A.4) 



where the first inequality follows Lemma A.l Diaconis and Mallows [9] showed that 
the moments of the trace of R equal the moments of a standard normal variable for 
all sufficiently large n. In particular, when d — ¥ oo, the limit of Tr (R) has mean 
and variance 1. Therefore using Chebyshev inequality we get that 



(\Tt(R)\ 



< d< 



1 as d 



(A.5) 



Hence let d — > oo in (A.4), we obtain lim^oo \/2dc (d) = 1. □ 



Appendix B. Proof of Lemma 4.10, Here we aim to prove that the limiting 
spectral density of the normalized random matrix ~7=y-D is Wigner's semi-circle, 
where D is defined in ( |4.20 1. The following definitions and results will be used. 

• The Cauchy (Sticltjcs) transform of a probability measure \x and moments 

{ TO fe}feli is g iven b y 



dfi (x) 



1 

7 + E5tT' ^ eC < Im W> 



0. 



fc=l 



The density g (x) can be recovered from G^ by the Stielgjes inversion formula: 



9{x) 



-— lim Im (Gu (x + it)) 

7T e->0 
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The Wigner semicircle distribution /i 0i(T 2 centered at with variance a 2 is the 
distribution with density 



, v J V % a ^ x ' 2 if x £ [-2(7, 2a] , 
I otherwise. 

The Cauchy transform of the semicircle distribution /i CT 2is given by 



z - Vz 2 - 4fT 2 

We now state a theorem by Girko, which extends the semicircle law to the case of 
random block matrices, and show how, in particular, this follows for the random 
matrix— ?=> D. 

Theorem B.l. (Girko, 1993 in 110]) Suppose the nd x nd matrix M is composed 
of d x d independent random blocks Mij, i, j = 1, 2, . . . , n which satisfy 

Mij = Mf., E {Mij) = O d , E \\M i:j f < oo. (B.l) 

Suppose also that 



JC||iWij|' 2 

and that Lindeberg's condition holds: for every e > 



sup max VE||M„|| 2 < oo (B.2) 

n i— 1,2,... ,n ^- — ' 



lim max Vfi (\\M l3 \\ 2 *(|| My ||> e ) ) = 0, (B.3) 

where X denotes an indicator function. Let \\ > . . . > \ nc [ be the eigenvalues of M 
and let fj, n (x) = %{\i<x) be the eigenvalue measure. Then for almost any x, 

\t i "a ( x ) ~ F-n \ — > as n oo a.s., 

where F n (x) are the distribution functions whose Cauchy /Stieltjes transforms are 
given by -]K Y17=i Tr(Ci (zj), where the matrices Ci (z) satisfy 

a (z) = ( E lzl d -Ej^MijCj (z) MfA J , i = 1,2, . . . ,n. (B.4) 



The solution Cj (z), i = 1, 2, . . . , n to (B.4) exists and is unique in the class of analytic 
d x d matrix functions C (z) for which Im (z) Im (C (z)) < 0, Im (z) 7^ 0. 

Now we apply the result of the theorem on the matrix W = ^7=3= D whose d x d 
blocks satisfy 



w.p. p, 
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for i 7^ j, where Rij = Rj±. Notice that W is a random symmetric matrix (Wy = W?) 
with i.i.d off-diagonal blocks. From the definition of c (d) it follows that E (Wy) = 0. 
Also, || WV, || is finitely bounded, that is, 



< 

n — 



Therefore, all the conditions (condtion (B.1|-(B.3|) of the theorem are satisfied by 
the matrix W. Before we continue, we need the following 

Lemma B.2. We have E (p^ffp) = ^h, and E ( p^Ef[r ) = c (d) Id- 
Proof. Using the symmetry of the Haar measure on SO(d), we can assume that 

E ( p^Zrw^ = a ^d, and E f p^Ejf|j J = bl d , where a and b are constants depending 

on d. Therefore we have 



d V V Pd-fll 



2 



1 E / Trd.-i?) 



and 



lf \Tr {{I d -R)(I d -R) T 

^(T^-Rl 
d \2Tr (I d -R) 
1 

2d' 



We claim that Cj (z) = (z) Id, where h (z) is a function of z. In fact, 

n 



-'i' 



= (1 - p) " w E ( (w^kl - c w '*) (iK^ii - c{i) '« 

= (l-p)h (z) Q J d - 2c (d) 2 7 d + c (d) 2 I d 
= (l-p)h(z) Q- C (d) 2 )l d , 

34 



where the fourth equality uses Lemma B.2 From (B.4) we obtain 

-l 



//(:')/,,= ( :./,; - (I - /»)/'(:) ( i- c(df ) I,, 



that is, 



which reduces to 



h(z) = 



c(dy)h(zy -zh(z) + 1 = 1, 



*±^-4(l-p)(i-c(d) 2 ) 
2(l-p)fi-c(d) 



The uniqueness of the solution now implies that the Cauchy transform of F n (x) is 



1 - 

— X>(<7« (*)) = &(*) = 



z- x lz2-4(l-p)^-c(d) 

:(l-p)(i- c (d) 2 ) 



where we select the — branch according to the properties of the Cauchy trans- 
form. This is exactly the Cauchy transform of the semicircle law with support 



-2^(l-p)(i-c(d) 2 ), 2^(1 -p) (|-c(d) 2 ) 



Hence, for large n the eigenval- 
ues of -D are distributed according to the semicircle law with support 



-2 X (1-p) 



-- c{df ) (n - 1),2 J (1-p) 



Thus the limiting spectral density of J— ^- D is Wigner's semi-circle. Since any sym- 
metric matrix can be decomposed into a superposition of a positive semidefinite matrix 
and a negative definite matrix (this follows immediately from the spectral decompo- 
sition of the matrix), the matrix D can be decomposed as 

D = D+ + D_, 

where D + !>= and D_ -< 0. Clearly, 



ir>l 



l^+l| 2 + P-|| 2 : 



and since the limiting spectral density of _ D is Wigner's semi-circle that is sym- 
metric around 0, we have 

1 



D 



From the law of large numbers we get that ||D|| is concentrated at 

2\ 



\D\\ « (1 -p)n(n- 1)E 



-c(d)id 



| Id — Ri 

(1 -p) n (n - 1) I 1 + c (d) 2 d - 2r ( d) 3 [ Tr 

(1 -p)n(n - 1) (l -c(dfdj 
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1? 



Hence, 

\\D+\\ 2 « P-|| 2 « \{l -p)n{n- 1) (l - c(d) 2 d) . 



Appendix C. Proof of Lemma |4.11[ We have already shown that the gain 
from Q is at most Op (\/nlogn) X)"=i IIQiill- Now we will show that the gain from 
Q is less than the sum of Op (^nlogn) Tr (T) and the loss in the off-diagonal entries 
of Aij for € £ C! specifically speaking, for p > | 



i=l 

<O p (^/nlogn) Tr (T) + 7— 
V / (p- 



Op ( A/n log j 



E 

(i,j)e£ c 



A 



ofi I 



(C.l) 



which is equivalent to prove Lemma 4.11 

A matrix is skew-symmetric if X 1 = —X. Every matrix X can be written as the 
sum of a symmetric matrix and a skew-symmetric matrix: X — x+x — ^ X ~2 ■ We 
will call the first term the symmetric part and the second term the skew-symmetric 
part. Recall that Q 1 is a matrix with identical columns = Q^). We denote the 
symmetric part and the skew-symmetric part of as 



Qli + Qi 



T^ii ~f~ Pit 



(C.2) 



and 



Q 



i,ss Qa Qi 



(C.3) 



respectively. Then we have 

\\QU\ = 



Q\'° + Q 



l,ss 



< 



Q 



l.s 



Q 



l,ss 



(C.4) 



Later we will see the skew-symmetric part Qj' ss is exactly what's causing troubles. 
Before dealing with the trouble, let's first handle the symmetric part Q\ ,s . 
For the symmetric part Q\' s we have 

n 1 n 

Ek 1,s ^Eii t «+^h 



/ n n \ 

<2 Eii T »n+Eii p «ii 

\i=l i=l / 

1 / n n 

= 2 Em+ E t « 

\i=l i=l 
n n 

<Eii T ^ii^E Tr (^) = Tr ( T )' 



i=l 
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(C.5) 



where the second equality uses Lemma |4.3| and the third inequality uses the fact that 
T u fc= and that for any matrix X ^ 0, \\X\\ < Tr (X). 

Let us consider now the skew- symmetric part Q] ,ss . For any index p, q G {1,2, d}, 
consider J2"=i \Q] ,SS (p> q) Ij where Q]' ss (jp, q) is the (p, g)th entry of the matrix Q i ' 5S . 
Without loss of generality and for the purpose of simplicity, let us assume that 



is largest when p — 1 and q = 2 (it won't be largest on diagonal, 



because diagonal entries have to be 0). Now we know 

n n 

\\Ql' ss \\< d 2 J2\Ql' ss M\- (C6) 

i=l i=l 

This is just because ' ss has d 2 entries. Thus later we will just care about Q\' ss (1,2) 
and Q]' ss (2, 1). We define the matrices Q M , Q 1 '^, Q 2 < s , and Q 2 ' ss as 

Qlf = Q) -. Q]f = QY\ Q:: = Q)-. Qrr = Ql " for all i and j, (C.7) 

and the matrix Q s = Q 1 - 5 + Q 2 - s and Q ss = Q 1 - 83 + Q 2 > ss . Thus it is easy to see 
Q = Q s + Q ss . Restrict the matrices P, Q, T and so on to the good entries and 
obtain P c , Q Cl T c and so on (which means, for example, when £ £ c , set the 

(i,j)'s block P c (i,j) = 0; when G £ c , keep P c (i,j) = P(i,j). Then we can 

prove the following lemmas. 

Lemma C.l. Let the vectors s 1: s 2 G be defined as in (4-6), that is, 

Bi = (1, 0,0,. ..0, 1,0,0,. ..,0, ,1,0,0,...,0) T , 

s 2 = (0, 1,0,... 0,0,1,0,. ..,0, ,0,1,0,...,0) T . 

And define the vectors ai,a 2 G R nd as following: 

{1 if m = d ■ i + I and Q] ,ss (p, q) > 0, 
-1 ifm = d-i + landQ 1 i ' ss (p,q)<0, for I = 1,2. (C.8) 
otherwise, 

Then for p > 1/2, we have the following inequality 

( 1 \ " 

sf (Pc + Qc) a 2 - (P c + Q c ) &1 > 2n - - - O p (Vlogn/n) J ^ |Q-' ss (1, 2 

(C9) 



i=l 



Proof. Firstly, note that for any matrix X G M. ndxnd , sfXa.2 — s^X&i is the sum 
of the differences between Xij (1, 2) and Xij (2, 1). Thus it is easy to verify that the 
symmetric matrix P c and the symmetric parts Q\' s and Q 2 ' s in Q s c contribute to 
the left handside of (C.9), that is, 

sf (P c + Q c ) a 2 - s% (P c + Q c ) ai = sf Qf a 2 - s^Qf ai = 2sf Qf a 2 , (CIO) 

where the second equality uses the fact that — s^Q* s ai = s^Q^ s a 2 due to the skew- 
symmetry of the submatrices of Q S L S . 
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(a) M a 



(b) M b 



ra 



>Vl/2 

(c) Contribution from Mi,. 



Ql' ss (1,2) 



Fig. C.l: Two n x n matrices. The matrix M a is defined as M a (i,j) 
if £ £ c , otherwise M a (i,j) = 0. The matrix Mb is defined as Mb(i,j) — 

~Q}' SS (1, 2) sign (o)' 88 (1, 2)) if € £ c , otherwise M 6 (i,j) = 0. With the as- 

sumption that there is an integer i such that Q\' ss > when i < i and Q^' ss < 
when i > io, M a and Mf, have non-negative regions with sign "+" and non-positive 
regions with sign "-" as shown in sub-figure (a) and (b). The right hand-sides of 
(C.ll) and (C.12) can be represented as sums of the entries (i, j) € £ c of M a and Mb 



respectively. When taking a summation of right hand-sides of (C.ll) and (C.12), the 
entries in the "-" regions of Mb and those corresponding entries in M a cancel each 
other out. Thus the contribution to the sum comes from the entries at € £ c in 
"+" regions of Mb and the corresponding parts in M a . The sub-figure (c) sketches the 
worst case for the contribution by entries at € £ c . The blue region represents the 
fact that each row has roughly p fraction of edges in £ c . Since one of the "+" regions 
has to be wider than 1/2 fraction, there must be at least roughly p — 1/2 fraction of 
edges in £ c per row in that region, which is represented by the orange region. Thus 



the sum of right hand-sides of (C.ll) and (C.12) is at least twice the sum of entries 



in the orange region, where all entries are positive. 



Now we just need to focus on s^qss^ _ s t (qM s -j_ Q 2 c ' ss ) sl 2 - For each of the 
submatrices of Ql ,ss and Q^ ss , we only look at their entry at index (1, 2). Due to the 
definition in (C.7) and (C.8) we have 



sfQ^ s a 2 = £ Q\f (1,2) sign (1,2) 



E 



(1,2) (C.ll) 
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and 



Si Q c a 2 



E 



Q*f s (1,2) sign {q]' ss (1,2) 
E -Qj'" (1,2) sign (qV- (1,2 



(C.12) 



Without loss of generality and for the purpose of simplicity, we assume there is an 
integer zq such that Q\' ss > when i < io and Q t ' ss < when i > iq. Then the right 
handsides of (C.ll) and (C.12) can be represented as sums of the entries € £ c of 
the n x n matrices in Figure |C.la| and Figure [C. lb| respectively. Apparently in some 
parts, even if there is a edge in £ c things will cancel (these are the parts that are 
marked negative in Figure C.lbl. But in the parts that are marked positive in Figure 



C.lb[ if there is an edge in £ c , things will not cancel and will indeed contribute to 
sf Q s c s &2- Notice that every row has pn + Op {\/n log n) entries in £ c . Also, one of the 
regions marked with + in the second figure has to be wider than n/2. There must b e 

at least p — 1/2 — Op ( y/log n/n\ fraction of edges in £ c per row there (Figure C.lb ). 



The contribution of these entries in both matrices in Figure |C.la| and Figure |(J.lb| is 



1/2 - Op (Vfog^)) nJ2 \Ql' SS (1,2) (< s^Q-'aa) 



(C.13) 



where we use the fact that 



J2 or (i,2) = - E 



(l,2)=£|Qr(l,2)|/2. 



g= s (i,2)>o 



Q= s (l,2)<0 



See Figure C.lc the contribution (C.13) comes from the orange part. Combine (C.10) 
and (C.13) together and we obtain (C.9). □ 
Lemma C.2. 



n 



E 



\Ql'"\\ < 



p-l/2- 
Proof. We know that 



OpUlogn/i 



n 5^ 



A 



off 



- Tr (T) 



Sf {Pc + Qc+ T c ) SL 2 

-sj A c a 2 - s^A c ai 
' > |A y (1,2)| + 



<2 



E 
E 



- (P c + Q c + T c ) ai 

E l A «(2,l)l 
(i,i)e£ c 



A' 



off I 



(C.14) 



(C.15) 



where A° ff is obtained by restricting A»j on the off-diagonal entries. In addition, we 
have 

-sf T c a 2 < ]T \ T v (1.2)1 < J E ( T " X ) + T -» ( 2 ' 2 )) = ? Tr ( T ) ' 



1,3 



3D 



where the second inequality uses the fact that T )p 0, and similarly we have s^T c ai < 
fTr(T), thus we obtain 



sf T c a 2 + T c ai < nTr (T) 



(C.16) 



Combining flC.6| ) (lC.9| ), ( |C 15[) an d (|C_16|) together we get flC 14[) □ 
Using (|C.4|), (|C.5|) and 1C.14I) , we reach the conclusion in (IC.l I. 
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